Abstract



An important problem in the implementation of Markov Chain Monte Carlo algorithms is to determine
the convergence time, or the number of iterations before the chain is close to stationarity. For many
Markov chains used in practice this time is not known. Even in cases where the convergence time is
known to be polynomial, the theoretical bounds are often too crude to be practical. Thus, practitioners
like to carry out some form of statistical analysis in order to assess convergence. This has led to
the development of a number of methods known as convergence diagnostics which attempt to diagnose
whether the Markov chain is far from stationarity. We study the problem of testing convergence in the
following settings and prove that the problem is hard in a computational sense:

• Given a Markov chain that mixes rapidly, it is hard for Statistical Zero Knowledge (SZK-hard) to
distinguish whether starting from a given state, the chain is close to stationarity by time t or far
from stationarity at time ct for a constant c. We show the problem is in AM intersect coAM.

• Given a Markov chain that mixes rapidly it is coNP-hard to distinguish whether it is close to
stationarity by time t or far from stationarity at time ct for a constant c. The problem is in coAM.

• It is PSPACE-complete to distinguish whether the Markov chain is close to stationarity by time t
or far from being mixed at time ct for c > 1.



1 Introduction 



Markov Chain Monte Carlo (MCMC) simulations are an important tool for sampling from high dimensional 
distributions in Bayesian inference, computational physics and biology and in applications such as image

processing. An important problem that arises in the implementation is that if bounds on the convergence 
time are not known or impractical for simulation then one would like a method for determining if the chain 
is still far from converged. 

A number of techniques are known to theoretically bound the rate of convergence as measured by
the mixing time of a Markov chain, see e.g. lUldllO. These have been applied to problems such
as volume estimation [15], Monte Carlo integration of log-concave functions [16], approximate counting
of matchings [13] and estimation of partition functions from physics [12]. However, in most practical
applications of MCMC, there are no effective bounds on the convergence time so for example it may not
be known if a chain on 2^"^*^ states mixes in time 1000 or 2^^. Even in the cases where rapid mixing is 
known, the bounds are often impractical since they are not tight especially since applications usually require 
multiple independent samples. 

As a result, practitioners have focused on the development of a large variety of statistical methods, called
convergence diagnostics, which try to determine whether the Markov chain is far from stationarity (see e.g.
surveys by [10, 4, 6, 5, 7, 17]). A majority of practitioners of the MCMC method run multiple diagnostics to
test if the chains have converged. The two most popularly used public domain diagnostic software packages
are CODA and BOA [18, 3]. The idea behind many of the methods is to use the samples from the empirical
distribution obtained when running one or multiple copies of the chain, possibly from multiple starting states,
to compute various functionals and identify non-convergence.

While diagnostics are commonly used for MCMC, it has been repeatedly observed that they cannot guarantee
convergence, see e.g. [6, 4, 2].

Here we formalize convergence to stationarity detection as an algorithmic problem and study its complexity
in terms of the size of the description of the Markov chain, denoted by n. Our main contribution is showing
that even in cases where the mixing time of the chain is known to be bounded by $n^C$ for some large C, the
problem of distinguishing whether a Markov chain is close to or far from stationarity at time $n^c$ for c much
smaller than C is "computationally hard". In other words under standard assumptions in computational
complexity the problem of distinguishing whether the chain is close to or far from stationarity cannot be
solved in time $n^D$ for any constant D.

The strength of our results is in their generality as they apply to all possible diagnostics and in the weakness 
of the assumption - in particular in assuming that the mixing time of the chain is not too long and that the 
diagnostic is also given the initial state of the chain. 

From the point of view of theoretical computer science, our results highlight the role of Statistical Zero 
Knowledge, AM, coAM and coNP in the computational study of MCMC. 

2 Results 

We begin by defining the mixing time which measures the rate of convergence to the stationary distribution.
Recall that the variation distance (or statistical distance) between two probability distributions $\mu$ and $\nu$ on
state space $\Omega$ is given by $d_{tv}(\mu, \nu) = \frac{1}{2} \sum_{\omega \in \Omega} |\mu(\omega) - \nu(\omega)|$.

Definition 1 (Mixing time) Let M be a Markov chain with state space $\Omega$, transition matrix P and a unique
stationary distribution $\pi$. The following measure of distance to stationarity will be convenient to define:

$$d(t) := \max_{x, y \in \Omega} d_{tv}(P^t(x, \cdot), P^t(y, \cdot)).$$

The $\epsilon$-mixing time is defined to be

$$\tau(\epsilon) := \min\{t : d(t) \le \epsilon\}.$$

We refer to $\tau(1/4)$ as the mixing time. We also define the $\epsilon$-mixing time starting from x:

$$\tau_x(\epsilon) := \min\{t : d_{tv}(P^t(x, \cdot), \pi) \le \epsilon\}.$$

We note that $\tau_x(\epsilon) \le \tau(\epsilon)$ for all x.
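For a chain small enough to write down as a matrix, the quantities in Definition 1 can be computed by brute force. The following is a minimal sketch under stated assumptions: the lazy walk on a 5-cycle is a hypothetical test chain (not one from this paper), the helper names are illustrative, and the threshold is read as "at most $\epsilon$".

```python
from fractions import Fraction

def tv(mu, nu):
    """Total variation distance: half the L1 distance between two probability lists."""
    return sum(abs(a - b) for a, b in zip(mu, nu)) / 2

def step(dist, P):
    """Advance a distribution (row vector) by one step of the chain with matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def lazy_cycle(n):
    """Hypothetical test chain: hold w.p. 1/2, move to each neighbour w.p. 1/4."""
    P = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        P[i][i] += Fraction(1, 2)
        P[i][(i + 1) % n] += Fraction(1, 4)
        P[i][(i - 1) % n] += Fraction(1, 4)
    return P

def mixing_times(P, pi, eps, t_cap=1000):
    """Return (tau(eps), [tau_x(eps) for every x]) by direct enumeration of t."""
    n = len(P)
    rows = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]  # rows of P^0
    tau, tau_x = None, [None] * n
    for t in range(t_cap + 1):
        d_t = max(tv(rows[x], rows[y]) for x in range(n) for y in range(n))
        for x in range(n):
            if tau_x[x] is None and tv(rows[x], pi) <= eps:
                tau_x[x] = t
        if d_t <= eps:
            tau = t
            return tau, tau_x
        rows = [step(r, P) for r in rows]
    raise ValueError("chain did not mix within t_cap steps")

P = lazy_cycle(5)
pi = [Fraction(1, 5)] * 5          # uniform is stationary for this symmetric walk
tau, tau_x = mixing_times(P, pi, Fraction(1, 4))
```

Every entry of `tau_x` is at most `tau`, since the distance from a single start to stationarity never exceeds the worst pairwise distance d(t). The enumeration takes time polynomial in the number of states, which is exponential in the description size n considered in this paper.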

To formulate the problem, we think of the Markov chain as a "rule" for determining the next state of the 
chain given the current state and some randomness. 



Definition 2 We say that a circuit $C : \{0,1\}^n \times \{0,1\}^m \to \{0,1\}^n$ specifies P if for every pair of states
$x, y \in \Omega$, $\Pr_{r \sim \{0,1\}^m}[C(x, r) = y] = P(x, y)$.

In this formalization, x is the "current state", r is the "randomness", y is the "next state" and C is the "rule". 
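As an illustration of this formalization, the sketch below uses a hypothetical rule `C` on 2-bit states with 2 random bits (a lazy walk on the square) and recovers the transition probabilities it specifies by enumerating all random strings. The rule and the function names are assumptions made for the example only.

```python
from fractions import Fraction
from itertools import product

def C(x, r):
    """Hypothetical rule: if r[0] == 0 stay put, otherwise flip the bit indexed by r[1]."""
    if r[0] == 0:
        return x
    y = list(x)
    y[r[1]] ^= 1
    return tuple(y)

def transition_matrix(rule, n, m):
    """Recover P(x, y) = Pr_r[rule(x, r) = y] by enumerating all 2^m random strings."""
    states = list(product((0, 1), repeat=n))
    P = {x: {y: Fraction(0) for y in states} for x in states}
    for x in states:
        for r in product((0, 1), repeat=m):
            P[x][rule(x, r)] += Fraction(1, 2 ** m)
    return P

P = transition_matrix(C, n=2, m=2)
```

The rule takes a handful of lines to describe, while the matrix it specifies has $2^n \times 2^n$ entries; this gap is why complexity is measured in the size of the rule.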
Next we formalize the notion of "Testing convergence". We imagine the practitioner has a time t in mind 
that she would like to run the Markov chain algorithm for. She would like to use the diagnostic to determine 
whether at time t: 

• The chain is (say) within 1/4 variation distance of stationarity 

• or at least at distance 1/4 away from it. 

Requiring the diagnostic to determine the total variation at time t exactly is not needed in many situations. 
Many practitioners will be happy with a diagnostic which will 

• Declare the chain has mixed if it is within 1/8 variation distance of stationarity at time t. 

• Declare it did not mix if it is at least at distance 1/2 away from it at time t. 
An even weaker requirement for the diagnostic is to: 

• Declare the chain has mixed if it is within 1/8 variation distance of stationarity at time t. 

• Declare it did not mix if it is at least at distance 1/2 away from it at time ct, where c > 1.

Thus in the last formulation, the practitioner is satisfied with an approximate output of the diagnostics both
in terms of the time and in terms of the total variation distance. This is the problem we will study. In fact,
we will make the requirement from the diagnostic even easier by providing it with a (correct) bound on the
actual mixing time of the chain. This bound will be denoted by $t_{\max}$.

In realistic settings it is natural to measure the running time of the diagnostics in relation to the running
time of the chain itself as well as to the size of the chain. In particular it is natural to consider diagnostics
that would run for time that is polynomial in t and $t_{\max}$. The standard way to formalize such a requirement
is to insist that the inputs t, $t_{\max}$ to the diagnostic algorithm be given in unary form (note that if t, $t_{\max}$
were specified as binary numbers, an efficient algorithm would be required to run in time poly-logarithmic
in these parameters, a much stronger requirement). We continue with a description of the different diagnostic
problems and the statement of the hardness results.

2.1 Given Starting Point 

The discussion above motivates the definition of the first problem below. Assume that we had a diagnostic
algorithm. As input, it would take the tuple $(C, x, 1^t, 1^{t_{\max}})$, i.e., a description of the circuit which
describes the moves of the Markov chain, an initial starting state for the chain, and the times t and $t_{\max}$, which
are specified as unary numbers. The following theorems show that a diagnostic algorithm as described
above is unlikely to exist under standard complexity-theoretic assumptions. We consider two versions of
the convergence testing problem, one where the starting state of the Markov chain is specified
(GapPolyTestConvergenceWithStart$_{c,\delta}$) and the other where it is arbitrary (GapPolyTestConvergence$_{c,\delta}$):

Problem: GapPolyTestConvergenceWithStart$_{c,\delta}$ (GPTCS$_{c,\delta}$).

Input: $(C, x, 1^t, 1^{t_{\max}})$, where C is a circuit specifying a Markov chain P on state space $\Omega \subseteq \{0,1\}^n$,
$x \in \Omega$, and $t, t_{\max} \in \mathbb{N}$.

Promise: The Markov chain P is ergodic and $\tau(1/4) < t_{\max}$.

YES instances: $\tau_x(1/4 - \delta) < t$.

NO instances: $\tau_x(1/4 + \delta) > ct$.

Informally the input to this problem is the MC rule C, a starting state x, and times t, $t_{\max}$. It is promised
that the chain mixes by time $t_{\max}$. The expectation from the diagnostic is to:

• Declare the chain has mixed if it is within $1/4 - \delta$ variation distance of stationarity at time t.

• Declare it did not mix if it is at least at distance $1/4 + \delta$ away from it at time ct, where c > 1.

Note again that the diagnostic is given room for error both in terms of the total variation distance and in 
terms of the time. 
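To make the gap structure concrete, here is a minimal sketch of what a correct answer looks like when the chain is given as an explicit matrix. This brute-force classifier is not a diagnostic in the sense of this paper (it takes time polynomial in the number of states, not in the size of the rule). The function names, the integer constant `c`, the "at most" reading of the mixing-time threshold, and the lazy 8-cycle test chain are assumptions made for illustration.

```python
from fractions import Fraction

def tv(mu, nu):
    """Total variation distance between two probability lists."""
    return sum(abs(a - b) for a, b in zip(mu, nu)) / 2

def step(dist, P):
    """One step of the chain applied to a distribution."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def tau_x(P, pi, x, eps, t_cap):
    """Smallest t <= t_cap with d_tv(P^t(x, .), pi) <= eps, or None if there is none."""
    dist = [Fraction(int(i == x)) for i in range(len(P))]
    for t in range(t_cap + 1):
        if tv(dist, pi) <= eps:
            return t
        dist = step(dist, P)
    return None

def classify(P, pi, x, t, c, delta):
    """Label an explicit instance as 'YES', 'NO', or 'GAP' (neither case applies)."""
    quarter = Fraction(1, 4)
    if tau_x(P, pi, x, quarter - delta, t_cap=t - 1) is not None:
        return "YES"                      # tau_x(1/4 - delta) < t
    if tau_x(P, pi, x, quarter + delta, t_cap=c * t) is None:
        return "NO"                       # tau_x(1/4 + delta) > c * t
    return "GAP"

# Hypothetical test chain: lazy random walk on an 8-cycle, uniform stationary law.
n = 8
P = [[Fraction(0)] * n for _ in range(n)]
for i in range(n):
    P[i][i] += Fraction(1, 2)
    P[i][(i + 1) % n] += Fraction(1, 4)
    P[i][(i - 1) % n] += Fraction(1, 4)
pi = [Fraction(1, n)] * n
delta = Fraction(1, 8)
```

With these inputs, `classify(P, pi, 0, 1, 1, delta)` returns `"NO"` (after one step the walk is still at distance 5/8 from uniform) and `classify(P, pi, 0, 200, 2, delta)` returns `"YES"`. Instances labelled `"GAP"` are those on which a diagnostic is allowed to answer either way.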

The following theorem refers to the complexity class SZK, which is the class of all promise problems that 
have statistical zero-knowledge proofs with completeness 2/3 and soundness 1/3. It is believed that these 
problems cannot be solved in polynomial time. 

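Theorem 1 Let c > 1.

• For $0 < \delta < 1/4$, GPTCS$_{c,\delta}$ is in AM ∩ coAM.

• For $(2\sqrt{3} - 3)/4 = .116025\ldots < \delta < 1/4$, GPTCS$_{c,\delta}$ is in SZK.

• Let $0 < \delta < 1/4$. For

$$c < \frac{t_{\max}}{4t} \ln\left(\frac{2}{1 + 4\delta}\right)$$

GPTCS$_{c,\delta}$ is SZK-hard.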

The most interesting part of the theorem is the last part which informally says that the problem GPTCS$_{c,\delta}$
is SZK-hard. In other words, solving it in polynomial time will result in solving all the problems in SZK in
polynomial time. The second part of the theorem states that for some values of $\delta$ this is the "exact" level of
hardness. The first part of the theorem states that without restrictions on $\delta$ the problem belongs to the class
AM ∩ coAM (which contains the class SZK). The classes AM and coAM respectively contain the classes
NP and coNP and it is believed that they are equal to them, but this is as yet unproven.
The restriction on the constant $\delta$ in the second part of the result comes from the fact that the proof is by
reduction to the SZK-complete problem STATISTICAL DISTANCE (SD, see Section 3 for precise definitions).
Holenstein and Renner give evidence in [9] that SD is in SZK only when there is a lower bound on the
gap between the completeness and soundness. We show that the restriction in Theorem 1 is necessary since
otherwise it would be possible to put SD in SZK for a smaller value of the completeness-soundness gap.
On the other hand, we can show a slightly weaker result and put GPTCS$_{c,\delta}$ into AM ∩ coAM without any
restrictions on $\delta$. To show this, we first prove that SD is in AM ∩ coAM when no restriction is put on the
gap between the completeness and soundness. This result may be interesting in its own right as it involves
showing protocols for STATISTICAL DISTANCE that are new, to our knowledge.
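The numerical threshold .116025 in the second part of Theorem 1 can be checked with a short calculation. This sketch assumes the reduction yields an SD instance with completeness $1/4 + \delta$ and soundness $1/4 - \delta$ (ignoring lower-order slack) and applies the condition $c^2 > s$ under which SD is known to be in SZK; the function name is illustrative.

```python
import math

def szk_threshold():
    """Positive root of (1/4 + d)^2 = 1/4 - d, i.e. of d^2 + (3/2) d - 3/16 = 0."""
    a, b, c = 1.0, 1.5, -3.0 / 16.0
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

threshold = szk_threshold()
closed_form = (2 * math.sqrt(3) - 3) / 4
```

Both `threshold` and `closed_form` agree with the constant .116025 stated in the theorem; for any larger $\delta$ the squared completeness exceeds the soundness.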

2.2 Arbitrary Starting Point 

So far we have discussed mixing from a given starting point. A desired property of a Markov chain is fast 
mixing from an arbitrary starting point. Intuitively, this problem is harder than the previous one since it 



involves all starting points. This is consistent with our result below where we obtain a stronger hardness. 

Problem: GapPolyTestConvergence$_{c,\delta}$ (GPTC$_{c,\delta}$).

Input: $(C, x, 1^t, 1^{t_{\max}})$, where C is a circuit specifying a Markov chain P on state space $\Omega \subseteq \{0,1\}^n$,
$x \in \Omega$ and $t, t_{\max} \in \mathbb{N}$.

Promise: The Markov chain P is ergodic and $\tau(1/4) < t_{\max}$.

YES instances: $\tau(1/4 - \delta) < t$.

NO instances: $\tau(1/4 + \delta) > ct$.

Note that the only difference between this and the previous problem is that the total variation distance is 

measured from the worst starting point instead of from a given starting point. 

Theorem 2 Let c > 1.

• For $0 < \delta < 1/4$, GPTC$_{c,\delta} \in$ coAM.

• Let $0 < \delta < 1/4$. For

$$c < \sqrt{\frac{t_{\max}}{t^2 n^3}} \left(\frac{3}{4} - \delta\right)$$

it is coNP-hard to decide GPTC$_{c,\delta}$.

Again the second part of the theorem is the more interesting part. It shows that the diagnostic problem is 
coNP hard so it is very unlikely to be solved in polynomial time. This hardness is stronger than SZK- 
hardness because SZK is unlikely to contain coNP-hard problems. If it did, this would imply that NP = 
coNP since SZK ⊆ AM and it is believed that AM = NP. The first part of the theorem shows that the
problem is always in coAM. 

2.3 Arbitrary mixing times 

Finally we remove the restriction that the running time of the algorithm should be polynomial in the times
t, $t_{\max}$. This corresponds to situations where the mixing time of the chain may be exponentially large in the
size of the rule defining the chain. This rules out many situations of practical interest. However it is relevant
in scenarios where analysis of the mixing time is of theoretical interest. For example there is extensive
research in theoretical physics on the rate of convergence of Gibbs samplers on spin glasses even in cases
where the convergence rate is very slow (see [8] and follow up work). In such setups it is natural to define
the problem as follows:

Problem: GapTestConvergence$_{c,\delta}$ (GTC$_{c,\delta}$).

Input: $(C, x, t)$, where C is a circuit specifying a Markov chain P on state space $\Omega \subseteq \{0,1\}^n$, $x \in \Omega$ and
$t \in \mathbb{N}$.

Promise: The Markov chain P is ergodic.

YES instances: $\tau(1/4 - \delta) < t$.

NO instances: $\tau(1/4 + \delta) > ct$.

Note that the main difference is that in this problem the time t is given in binary representation. Thus, 
informally in this case the efficiency is measured with respect to the logarithm of t. Additionally note 
that the mixing time of the chain itself does not put any restrictions on the diagnostic. We then prove the 
following result: 



Theorem 3 Let $1 < c < \exp(n^{O(1)})$.

• For $\exp(-n^{O(1)}) < \delta < 1/4$ it is in PSPACE to decide GTC$_{c,\delta}$.

• Let $0 < \delta < 1/4$, then it is PSPACE-hard to decide GTC$_{c,\delta}$.

It is known that PSPACE-hard problems are at least as hard as all the problems in polynomial time, coNP,
NP and all other problems in the polynomial hierarchy.

3 Protocols for statistical distance 

Given a circuit $C : \{0,1\}^n \to \{0,1\}^n$, the probability distribution $p_C$ associated to C assigns probability
$p(\omega) = |C^{-1}(\omega)|/2^n$ to every $\omega \in \{0,1\}^n$. We will be interested in estimating the statistical distance
between the distributions associated to a pair of circuits $C, C' : \{0,1\}^n \to \{0,1\}^n$. Denote those distributions
by p and p', respectively.

For a pair of constants $0 < s < c < 1$, SD$_{c,s}$ is defined to be the following promise problem. The inputs are
pairs of circuits $C, C' : \{0,1\}^n \to \{0,1\}^n$, the YES instances satisfy $d_{tv}(p, p') > c$, and the NO instances
satisfy $d_{tv}(p, p') < s$.

Sahai and Vadhan [20] show that for every pair of constants c, s the problem SD$_{c,s}$ is SZK-hard. They also
show that when $c^2 > s$, SD$_{c,s}$ is in SZK. Our theorem yields a weaker conclusion, but covers a wider
spectrum of parameters.

Theorem 4 For any pair of constants $0 < s < c < 1$, SD$_{c,s}$ is in AM ∩ coAM.

3.1 An AM protocol 

The following interactive protocol P for SD$_{c,s}$ essentially appears in [20] but we rewrite it here for the
precise parameters we need:

V: Flip a fair coin. If heads, generate a random sample from C. If tails, generate a random sample from
C'. Send the sample x to the prover.

P: Say if x came from C or from C'.

V: If prover is correct accept, otherwise reject.

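Claim 1 Protocol P is an interactive proof for SD$_{c,s}$ with completeness $1/2 + c/2$ and soundness $1/2 + s/2$.

Proof: We prove soundness first. Let T be the set of x's which the prover claims came from C. The
accepting probability is

$$\sum_{x \in T} \frac{p(x)}{2} + \sum_{x \notin T} \frac{p'(x)}{2} = \frac{1}{2}\Big(\sum_{x \in T} p(x) + \sum_{x \notin T} p'(x)\Big).$$

No matter what T is, we have that

$$\frac{1}{2}\Big(\sum_{x \in T} p(x) + \sum_{x \notin T} p'(x)\Big) = \frac{1}{2}\Big(1 - \sum_{x \notin T} p(x) + \sum_{x \notin T} p'(x)\Big) \le \frac{1}{2} + \frac{1}{2} d_{tv}(p, p'),$$

because $\sum_{x \notin T} (p'(x) - p(x)) \le d_{tv}(p, p')$ for every set T when $d_{tv}$ carries the factor 1/2 of Section 2,
and so the accepting probability is at most $1/2 + s/2$.

To prove completeness, notice that the above inequality is tight when T equals the set of those x such that
$p(x) \ge p'(x)$. So when the prover uses this strategy (say C if $p(x) \ge p'(x)$ and C' otherwise), the accepting
probability becomes exactly $1/2 + d_{tv}(p, p')/2 \ge 1/2 + c/2$. ■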
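The acceptance probability of the coin-flip protocol can be computed exactly for small explicit distributions. The sketch below is an illustration with hypothetical distributions `p` and `q` standing in for the outputs of C and C'; with $d_{tv}$ normalized by the factor 1/2 as in Section 2, the best prover strategy (name the likelier source) succeeds with probability $1/2 + d_{tv}(p, p')/2$.

```python
from fractions import Fraction

def tv(p, q):
    """Total variation distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys) / 2

def optimal_acceptance(p, q):
    """Verifier samples from p or q with probability 1/2 each; the prover
    answers with whichever source gives the sample higher probability."""
    keys = set(p) | set(q)
    return sum(max(p.get(k, 0), q.get(k, 0)) for k in keys) / 2

# Hypothetical output distributions of the two circuits.
p = {0: Fraction(3, 4), 1: Fraction(1, 4)}
q = {0: Fraction(1, 4), 1: Fraction(3, 4)}
acc = optimal_acceptance(p, q)
```

Here `tv(p, q)` is 1/2 and `acc` is 3/4. The advantage over random guessing scales linearly with the statistical distance, which is what separates YES from NO instances.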



3.2 A coAM protocol 

Showing that SD$_{c,s}$ is in coAM is a bit more involved. Such a protocol wants to accept when the statistical
distance between p and p' is small, and reject when the statistical distance is large. To develop some intuition,
let us first attempt to distinguish the cases when p and p' are the same distribution (i.e. s = 0) and the case
when they are at some distance from one another (say c = 1/2).

Let's forget for a moment that the verifier has to run in polynomial time. Suppose the verifier could get hold
of the values

$$N(t) = |\{\omega : |C^{-1}(\omega)| \ge t \text{ and } |C'^{-1}(\omega)| \ge t\}|$$

for every t (which could potentially range between 0 and $2^n$). Then it can compute the desired statistical
distance via the following identity which will be proven later:

$$\sum_{t=1}^{2^n} t \cdot (N(t) - N(t+1)) = (1 - d_{tv}(p, p')) \cdot 2^n. \qquad (1)$$

If we want the verifier to run in polynomial time, there are two issues with this strategy: First, the verifier 
does not have time to compute the values N{t) and second, the verifier cannot evaluate the exponentially 
long summation in (1). If we only want to compute the statistical distance approximately, the second issue
can be resolved by quantization: Instead of computing the sum on the left for all the values of t, the verifier 
chooses a small number of representative values and estimates the sum approximately. For the first issue, the 
verifier will rely on the prover to provide (approximate) values for N{t). While the verifier cannot make sure 
that the values provided by a (cheating) prover will be exact, she will be able to ensure that the prover never 
grossly over-estimates the sum on the left by running a variant of the Goldwasser-Sipser protocol which we 
describe below. Since the sum on the left is proportional to one minus the statistical distance, it will follow 
that no matter what the prover's strategy is, he cannot force the verifier to significantly underestimate the 
statistical distance without being detected. 

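We now give the details of this protocol, starting with a proof of (1).
Proof of identity (1): Let $f(\omega) = \min\{|C^{-1}(\omega)|, |C'^{-1}(\omega)|\}$. Then

$$\sum_{\omega \in \{0,1\}^n} f(\omega) = \sum_{t=1}^{2^n} t \cdot |\{\omega : f(\omega) = t\}| = \sum_{t=1}^{2^n} t \cdot (N(t) - N(t+1)).$$

The right-hand side of this expression is exactly equal to the left-hand side of (1). For the left-hand side,
using the formula $\min\{a, b\} = (a + b)/2 - |a - b|/2$ (where $a, b \ge 0$) we have

$$\sum_{\omega \in \{0,1\}^n} f(\omega) = \frac{1}{2} \sum_{\omega \in \{0,1\}^n} \big(|C^{-1}(\omega)| + |C'^{-1}(\omega)|\big) - \frac{1}{2} \sum_{\omega \in \{0,1\}^n} \big||C^{-1}(\omega)| - |C'^{-1}(\omega)|\big| = 2^n - d_{tv}(p, p') \cdot 2^n,$$

which equals the right-hand side of (1). ■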
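Identity (1) can be checked mechanically on small examples. In this sketch the two circuits are replaced by hypothetical random lookup tables on n = 4 bits (an assumption for illustration), and both sides of the identity are computed exactly.

```python
import random
from collections import Counter
from fractions import Fraction

def identity_sides(table1, table2):
    """table[r] is a circuit's output on input r. Returns (lhs, rhs) of identity (1)."""
    size = len(table1)                                  # size = 2^n
    s1, s2 = Counter(table1), Counter(table2)           # preimage sizes |C^{-1}(w)|
    f = [min(s1[w], s2[w]) for w in range(size)]

    def N(t):
        # Number of outputs whose preimages under both circuits have size >= t.
        return sum(1 for v in f if v >= t)

    lhs = sum(t * (N(t) - N(t + 1)) for t in range(1, size + 1))
    d_tv = Fraction(sum(abs(s1[w] - s2[w]) for w in range(size)), 2 * size)
    return lhs, (1 - d_tv) * size

rng = random.Random(0)
n = 4
C1 = [rng.randrange(2 ** n) for _ in range(2 ** n)]
C2 = [rng.randrange(2 ** n) for _ in range(2 ** n)]
lhs, rhs = identity_sides(C1, C2)
```

The two sides agree exactly for any pair of tables, since both equal the sum of $f(\omega)$ over all outputs.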



A lower bound protocol for N(t) We now show that a variant of the Goldwasser-Sipser lower bound
protocol can be used to certify lower bounds on the quantities N(t). More precisely, we design an AM
protocol for the following problem:

Input: A pair of circuits $C, C' : \{0,1\}^n \to \{0,1\}^n$, a number $1 \le t \le 2^n$, a target number $0 < N \le 2^n$,
and a fraction $0 < \delta < 1$ (represented in unary).

Yes instances: $(C, C', t, N, \delta)$ such that $N(t) \ge N$.

No instances: $(C, C', t, N, \delta)$ such that $N((1 - \delta)t) < (1 - \delta)N$.



Here is a protocol for this problem. Here, $\delta_1, \delta_2$ are the largest values below $\delta$ that make the logarithms
below integers. In the analysis, for simplicity we will assume that $\delta_1 = \delta_2 = \delta$.

V: Set $a = \log(\delta_1^2 N/54)$. Send a random hash function $g : \{0,1\}^n \to \{0,1\}^a$.

P: Let $c = \lfloor (1 - \delta_1/2)(54/\delta_1^2) \rfloor$. Send a set of values $\{\omega_1, \ldots, \omega_c\}$.

V: Set $b = \log(\delta_2^4 t/5000)$. Send a random hash function $h : \{0,1\}^n \to \{0,1\}^b$.

P: Let $d = \lfloor (1 - \delta_2/2)(5000/\delta_2^4) \rfloor$. For each $1 \le i \le c$, send sets $\{r_{i1}, \ldots, r_{id}\}$ and $\{r'_{i1}, \ldots, r'_{id}\}$.

V: If $g(\omega_i) = 0$ for all i and $h(r_{ij}) = h(r'_{ij}) = 0$ and $C(r_{ij}) = C'(r'_{ij}) = \omega_i$ for all pairs (i, j), accept,
otherwise reject.

We first prove completeness: If $(C, C', t, N, \delta)$ is a yes instance, the protocol accepts with probability at
least 2/3. Let

$$S = \{\omega : |C^{-1}(\omega)| \ge t \text{ and } |C'^{-1}(\omega)| \ge t\}.$$

The expected number of $\omega \in S$ with $g(\omega) = 0$ is at least $(54/\delta^2) \cdot (N(t)/N)$. If $N(t) \ge N$, by Chebyshev's
inequality, the probability over g of getting fewer than $c = (1 - \delta/2)(54/\delta^2)$ such $\omega_i$'s is at most 1/6. Assuming
all these $\omega_i$'s exist, let's fix one of them. We now look at the set $T_i = \{r : C(r) = \omega_i\}$. Since $\omega_i \in S$, $T_i$
has size at least t, so the expected number of $r \in T_i$ such that $h(r) = 0$ is at least $5000/\delta^4$. By Chebyshev's
inequality, the probability of getting fewer than d such $r_{ij}$'s is at most $\delta^2/1248$. This bound holds for every
i and also for the sets $T'_i = \{r : C'(r) = \omega_i\}$. Taking a union bound over all 2c such sets we get that with
probability at least 5/6 over the choice of h, a sufficient number of $r_{ij}$'s and $r'_{ij}$'s exist for all values of i, so
the verifier accepts.

We now prove soundness: If $(C, C', t, N, \delta)$ is a no instance, the protocol accepts with probability at most
1/3. Now let

$$S = \{\omega : |C^{-1}(\omega)| \ge (1 - \delta)t \text{ and } |C'^{-1}(\omega)| \ge (1 - \delta)t\}.$$

The expected number of $\omega \in S$ with $g(\omega) = 0$ is then at most $(1 - \delta)(54/\delta^2)$. In this case, c is at least equal
to $(1 + \delta/3)$ times this expected value. By Chebyshev's inequality, the probability that there exist c such $\omega_i$'s
is then less than 1/6. If not, then the prover is forced to send at least one $\omega_i$ such that either $g(\omega_i) \ne 0$ or
$\omega_i \notin S$. In the first case, the verifier rejects. In the second case, we let

$$T_i = \{r : C(r) = \omega_i\} \quad \text{and} \quad T'_i = \{r : C'(r) = \omega_i\}$$

so either $|T_i| < (1 - \delta)t$ or $|T'_i| < (1 - \delta)t$. Without loss of generality, let us assume the first case. Then the
expected number of $r \in T_i$ such that $h(r) = 0$ is at most $(1 - \delta)(5000/\delta^4)$. We apply Chebyshev's inequality
again to conclude that with probability at least 5/6, the prover is then forced to send some $r_{ij}$ such that either
$h(r_{ij}) \ne 0$ or $C(r_{ij}) \ne \omega_i$. Thus the verifier accepts with probability at most 1/6 + 1/6 = 1/3.
Repeating this protocol in parallel sufficiently many times, we have the following consequence, which we
will use below:

Claim 2 There is an AM lower bound protocol for N(t) with completeness $1 - \delta/20n$ and soundness
$\delta/20n$.



A coAM protocol for statistical distance We now give the coAM protocol for statistical distance. We
begin with the observation that it is sufficient to handle the following special case of the problem:

Input: A pair of circuits $C, C' : \{0,1\}^n \to \{0,1\}^n$ and a fraction $0 < \delta < 1/3$ (represented in unary).
Yes instances: $(C, C', \delta)$ such that $d_{tv}(p, p') < \delta$.
No instances: $(C, C', \delta)$ such that $d_{tv}(p, p') > 3\delta$.

We can reduce SD$_{c,s}$ for any pair of constants $0 < s < c < 1$ to the above problem via the XOR lemma of
Sahai and Vadhan [20], which reduces SD$_{c,s}$ to SD$_{c^k, s^k}$ for an arbitrary constant k. When k is chosen so
that $(c/s)^k > 3$, the resulting instance can be handled by our protocol.
We now give the protocol for statistical distance:

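P: Send claims $\tilde{N}_i$ for the values $N_i = N((1 - \delta)^{-i})$, $0 \le i \le en/\delta$.

P, V: Run the AM lower bound protocol for $N_i$ on inputs $(C, C', (1 - \delta)^{-i}, \tilde{N}_i, \delta)$ for every $1 \le i \le en/\delta$.
If all of them pass, continue to the final check; otherwise reject.

V: Accept if $\sum_{i=0}^{en/\delta} (\tilde{N}_i - \tilde{N}_{i+1})(1 - \delta)^{-i} \ge (1 - \delta)^2 \cdot 2^n$.

The soundness and completeness rely on the following approximation, which is a quantized version of (1):

$$\sum_{i=0}^{en/\delta} (N_i - N_{i+1})(1 - \delta)^{-i} \le (1 - d_{tv}(p, p')) \cdot 2^n \le \sum_{i=0}^{en/\delta} (N_i - N_{i+1})(1 - \delta)^{-(i+1)}. \qquad (2)$$

This is proved in a similar way as (1). For every i, we have the sandwiching inequality

$$(N_i - N_{i+1})(1 - \delta)^{-i} \le \sum_{\omega : (1-\delta)^{-i} \le f(\omega) < (1-\delta)^{-(i+1)}} f(\omega) \le (N_i - N_{i+1})(1 - \delta)^{-(i+1)},$$

where $f(\omega) = \min\{|C^{-1}(\omega)|, |C'^{-1}(\omega)|\}$, which yields (2), after summing over all i from 0 to $en/\delta$.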
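The quantization step can be checked numerically: bucketing the values $\min\{|C^{-1}(\omega)|, |C'^{-1}(\omega)|\}$ by powers of $(1 - \delta)^{-1}$ loses at most a factor $(1 - \delta)$. The sketch below assumes hypothetical random lookup tables in place of the circuits and uses floating-point thresholds; it returns the lower estimate, the exact sum, and the upper estimate.

```python
import random
from collections import Counter

def quantized_bounds(f_values, n, delta):
    """Return (lower, exact, upper): exact is the sum of f, while lower and upper
    weight each geometric bucket by its left and right endpoint respectively."""
    ts = [1.0]                                   # ts[i] = (1 - delta)^(-i)
    while ts[-1] <= 2 ** n:
        ts.append(ts[-1] / (1 - delta))
    # N[i] counts outputs with f >= ts[i]; the last entry is 0 since f <= 2^n.
    N = [sum(1 for v in f_values if v >= t) for t in ts]
    lower = sum((N[i] - N[i + 1]) * ts[i] for i in range(len(ts) - 1))
    upper = sum((N[i] - N[i + 1]) * ts[i + 1] for i in range(len(ts) - 1))
    return lower, sum(f_values), upper

rng = random.Random(1)
n, delta = 5, 0.1
C1 = [rng.randrange(2 ** n) for _ in range(2 ** n)]   # hypothetical circuit tables
C2 = [rng.randrange(2 ** n) for _ in range(2 ** n)]
s1, s2 = Counter(C1), Counter(C2)
f_values = [min(s1[w], s2[w]) for w in range(2 ** n)]
lower, exact, upper = quantized_bounds(f_values, n, delta)
```

Only about $n/\delta$ buckets are needed, which is why the verifier can afford to check one claimed count per bucket.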

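To prove completeness, consider an honest prover which claims $\tilde{N}_i = N_i$ for all i. By Claim 2 and a union
bound, with probability at least 2/3 none of the lower bound protocols for $N_i$ reject. In this case, using (2),
we get

$$\sum_{i=0}^{en/\delta} (N_i - N_{i+1})(1 - \delta)^{-i} \ge (1 - \delta)(1 - d_{tv}(p, p')) \cdot 2^n,$$

which is at least $(1 - \delta)^2 \cdot 2^n$ because $d_{tv}(p, p') \le \delta$ on yes instances,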

establishing completeness. To prove soundness, assume now that the verifier accepts with probability at
least 1/3. By the soundness of the lower bound protocols for $N_i$ (Claim 2) and a union bound, there must
exist at least one setting of the randomness of the verifier for which $N_{i-1} \ge (1 - \delta)\tilde{N}_i$ for all i (where
$N_{-1} = N_0$) and the verifier accepts. Now (using the fact that the last value of N is zero):

en/5 en/5—1 

5^(iv.-iv.+i)(i-5)- = ^+ Yl iv^((i-5)-(^+^)-(i-5r)) 

i=-l 4=0 

en/5—1 

>No+ Y. (l-W+i((l-5)-(*+i)-(l-5)-^)) 

i=0 

en/5 

= 5No + (1 - 5) • ^(iV, - iVi+i)(l - 6)-' 

i=0 

$\ge (1 - \delta)^3 \cdot 2^n$,
so from (2) we get that $1 - d_{tv}(p, p') \ge (1 - \delta)^3$, so $d_{tv}(p, p') \le 1 - (1 - \delta)^3 < 3\delta$.



4 Diagnosing Convergence for Polynomially Mixing Chains 

The results of this section imply that even if the mixing time is restricted to being polynomial the diagnostic
problem remains hard. The two cases we consider are the worst case start mixing time and the mixing time
from a given starting state. Both hardness results are by reduction from a complete problem in the respective
classes. We first prove Theorem 1.

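Lemma 1 The problem GPTCS$_{c,\delta}$ is in SZK for all c > 1 and $(2\sqrt{3} - 3)/4 = .116025\ldots < \delta < 1/4$.

Proof: The proof is by reduction to SD$_{\bar{c},\bar{s}}$ where $\bar{c}$ and $\bar{s}$ are chosen as follows. Choose k large enough
such that

$$\left(\frac{1}{4} + \delta - \frac{1}{k}\right)^2 > \frac{1}{4} - \delta + \frac{1}{k}. \qquad (3)$$

Let

$$\bar{c} = \frac{1}{4} + \delta - \frac{1}{k} \quad \text{and} \quad \bar{s} = \frac{1}{4} - \delta + \frac{1}{k}.$$

Suppose we are given an instance of GPTCS$_{c,\delta}$ with input $(C, x, 1^t, 1^{t_{\max}})$. Let $T = \tau(1/k)$ be the time to
come within 1/k in variation distance of the stationary distribution. Let C' output the distribution $P^t(x, \cdot)$
over $\Omega$. Let C'' output the distribution $P^T(x, \cdot)$ over $\Omega$. In the YES case,

$$d_{tv}(P^t(x, \cdot), P^T(x, \cdot)) < \frac{1}{4} - \delta + \frac{1}{k},$$

while in the NO case,

$$d_{tv}(P^{ct}(x, \cdot), P^T(x, \cdot)) > \frac{1}{4} + \delta - \frac{1}{k}.$$

Since c > 1, this implies that

$$d_{tv}(P^t(x, \cdot), P^T(x, \cdot)) > \frac{1}{4} + \delta - \frac{1}{k}.$$

By (3), the constructed instance of SD$_{\bar{c},\bar{s}}$ is in SZK and the lemma follows. ■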

Lemma 2 The problem GPTCS$_{c,\delta}$ is in AM ∩ coAM for all c > 1 and $0 < \delta < 1/4$.

This part of the result follows directly from Theorem 4 by reducing GPTCS$_{c,\delta}$ to SD as above, without
the restriction on the gap between the completeness and soundness parameters. We can show that the gap
for $\delta$ in Lemma 1 is required for membership in SZK. Sahai and Vadhan [20] show that when $c^2 > s$,
SD$_{c,s}$ is in SZK. Holenstein and Renner [9] show that this condition on the gap between c and s is in fact
essential for membership in SZK.

Proposition 1 There exist c, s satisfying $c^2 < s < c$ and $c'$ such that if there is an SZK protocol for an
instance of GPTCS$_{c',\delta}$ with a sufficiently small $\delta$, then there is an SZK protocol for SD$_{c,s}$.

Proof: The proof is by reduction from SD$_{c,s}$ to GPTCS. Let (C, C') be an instance of SD$_{c,s}$ where
C and C' are circuits which output distributions $\mu_1$ and $\mu_2$ over $\{0,1\}^n$. Construct the Markov chain P,
whose state space is $[m] \times \{0,1\}^n$ where m = p(n) is a polynomial in n. The transitions of the chain are
defined as follows. Let the current state be $(X_t, Y_t)$ where $X_t \in [m]$ and $Y_t \in \{0,1\}^n$.

• If $X_t = 1$, choose $Y_{t+1}$ according to $\mu_1$.

• If $X_t = 2$, choose $Y_{t+1}$ according to $\mu_2$.

• Otherwise, set $Y_{t+1} = Y_t$.

• Choose $X_{t+1}$ uniformly at random from [m].

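The stationary distribution of the chain is given by $\pi(z, y) = \frac{1}{m}\left(\frac{1}{2}\mu_1(y) + \frac{1}{2}\mu_2(y)\right)$. Take the starting state
to be $x = (1, 0^n)$. In one step, the total variation distance from stationarity is

$$d_{tv}(P(x, \cdot), \pi) = \frac{1}{2} d_{tv}(\mu_1, \mu_2).$$

For $t \ge 1$, we have

$$P^t(x, (z, y)) = \frac{1}{m}\left[\frac{1}{2}\left(1 + \left(\frac{m-2}{m}\right)^{t-1}\right)\mu_1(y) + \frac{1}{2}\left(1 - \left(\frac{m-2}{m}\right)^{t-1}\right)\mu_2(y)\right]. \qquad (4)$$

Hence, it can be verified that

$$d_{tv}(P^t(x, \cdot), \pi) = \frac{1}{2}\left(\frac{m-2}{m}\right)^{t-1} d_{tv}(\mu_1, \mu_2). \qquad (5)$$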
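The decay of the distance to stationarity for this reduction chain can be verified by exact computation on a small instance. The sketch below assumes hypothetical distributions `mu1` and `mu2` on three symbols in place of the circuit outputs and m = 5. It iterates the transition rule from the start state (1, 0) and compares the distance at each time t with $\frac{1}{2}\left(\frac{m-2}{m}\right)^{t-1} d_{tv}(\mu_1, \mu_2)$.

```python
from fractions import Fraction

def tv(p, q):
    """Total variation distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys) / 2

def step(dist, mu1, mu2, m):
    """One transition of the reduction chain on states (X, Y) with X in 1..m."""
    new = {}
    for (x, y), w in dist.items():
        if x == 1:
            y_next = mu1                   # resample Y from mu1
        elif x == 2:
            y_next = mu2                   # resample Y from mu2
        else:
            y_next = {y: Fraction(1)}      # keep Y unchanged
        for y2, py in y_next.items():
            for x2 in range(1, m + 1):     # X is redrawn uniformly from [m]
                new[(x2, y2)] = new.get((x2, y2), 0) + w * py / m
    return new

def check_decay(mu1, mu2, m, steps):
    """True if the distance at every t in 1..steps matches the closed form."""
    support = set(mu1) | set(mu2)
    pi = {(x, y): (mu1.get(y, 0) + mu2.get(y, 0)) / (2 * m)
          for x in range(1, m + 1) for y in support}
    dist = {(1, 0): Fraction(1)}           # start state x = (1, 0)
    for t in range(1, steps + 1):
        dist = step(dist, mu1, mu2, m)
        expected = Fraction(1, 2) * Fraction(m - 2, m) ** (t - 1) * tv(mu1, mu2)
        if tv(dist, pi) != expected:
            return False
    return True

mu1 = {0: Fraction(1, 2), 1: Fraction(1, 2)}
mu2 = {1: Fraction(1, 4), 2: Fraction(3, 4)}
ok = check_decay(mu1, mu2, m=5, steps=6)
```

The parameter m controls how slowly the initial bias toward $\mu_1$ is forgotten, so a polynomially large m keeps the chain measurably far from stationarity for polynomially many steps.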

Let $0 < \delta < (\sqrt{5}/2 - 1)/2$, $s = 1/2 - 2\delta$ and $c = 1/2 + 2\delta$ so that $c^2 < s < c$. Set the time constant of the
GPTCS instance to $c' = 1$, and set $t = 1$ and $t_{\max} = m$.

In the YES case, $d_{tv}(\mu_1, \mu_2) < s$ and hence after one step,

$$d_{tv}(P(x, \cdot), \pi) < \frac{1}{2} s = \frac{1}{4} - \delta. \qquad (6)$$

In the NO case, $d_{tv}(\mu_1, \mu_2) > c$ and after one step,

$$d_{tv}(P(x, \cdot), \pi) > \frac{1}{2} c = \frac{1}{4} + \delta. \qquad (7)$$

From (5) it can be seen that in both cases, $\tau(1/4) < m = t_{\max}$. This completes the reduction since if there
is an SZK protocol for GPTCS$_{c',\delta}$ with the above parameters, then it can be used to distinguish the YES and
NO case of SD$_{c,s}$ for the above values of c, s. ■

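We now complete the proof of Theorem 1.

Lemma 3 Let $0 < \delta < 1/4$. For $1 < c < \frac{t_{\max}}{4t} \ln\left(\frac{2}{1 + 4\delta}\right)$, the problem GPTCS$_{c,\delta}$ is SZK-hard.

Proof: The proof uses the same reduction as in Proposition 1, now from SD with completeness parameter 1
and soundness parameter s. We recall that

$$d_{tv}(P^t(x, \cdot), \pi) = \frac{1}{2}\left(\frac{m-2}{m}\right)^{t-1} d_{tv}(\mu_1, \mu_2).$$

Choose m > 3. Set $s = 1/4 - \delta$ and note that $1^2 > s$. In the YES case, $d_{tv}(\mu_1, \mu_2) < s$ and hence
for any $t \ge 1$,

$$d_{tv}(P^t(x, \cdot), \pi) \le \frac{1}{2} s < \frac{1}{4} - \delta. \qquad (8)$$

In the NO case, $d_{tv}(\mu_1, \mu_2) \ge 1$ and hence

$$d_{tv}(P^{ct}(x, \cdot), \pi) \ge \frac{1}{2}\left(\frac{m-2}{m}\right)^{ct-1} \ge \frac{1}{2}\left(\frac{m-2}{m}\right)^{ct}. \qquad (9)$$

Since m > 3, if $ct < \frac{m}{4} \ln\left(\frac{2}{1 + 4\delta}\right)$, then $d_{tv}(P^{ct}(x, \cdot), \pi) > \frac{1}{4} + \delta$. Further, we see that in both the YES
and NO case,

$$\tau(1/4) < m. \qquad (10)$$

We conclude the reduction by setting $t_{\max} = m$. ■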

Next we prove Theorem 2 and classify the complexity of diagnosing mixing from an arbitrary starting state
given that the chain mixes in polynomial time. We will use the following result relating mixing time to the
conductance.

Definition 3 (Conductance, see e.g. [19]) Let M be a Markov chain corresponding to the random walk on
an edge weighted graph with edge weights $\{w_e\}$. Let $d_x$ denote the weighted degree of a vertex x. Define
the conductance of M to be $\Phi(M) := \min_{\emptyset \ne A \subset \Omega} \Phi_A(M)$ where

$$\Phi_A(M) := \frac{\sum_{x \in A,\, y \in A^c} w_{xy}}{\sum_{x \in A} d_x}. \qquad (11)$$

Theorem 5 (see [19]) Let M be a Markov chain corresponding to the random walk on an edge weighted
graph with edge weights $\{w_e\}$ as above. Let $\pi$ be the stationary distribution of the Markov chain. Then

$$\tau(\epsilon) \le \frac{2}{\Phi^2(M)} \log\left(\frac{2}{\pi_{\min}\, \epsilon}\right)$$

where $\pi_{\min}$ is the minimum stationary probability of any vertex.

Lemma 4 For every c > 1, $0 < \delta < 1/4$, GPTC$_{c,\delta}$ is in coAM.

Proof: In the first step of the coAM protocol for GPTC$_{c,\delta}$ the prover sends a pair $x, y \in \Omega$ that maximizes
$d_{tv}(P^t(x, \cdot), P^t(y, \cdot))$. Let $C_x$ be the circuit which outputs the distribution $P^t(x, \cdot)$ and let $C_y$ output the
distribution $P^t(y, \cdot)$.

In the YES case $\tau(1/4 - \delta) < t$ and for every x, y, $d_{tv}(P^t(x, \cdot), P^t(y, \cdot)) < 1/4 - \delta$. In the NO case,
$\tau(1/4 + \delta) > ct$ and c > 1, therefore there must exist x, y such that $d_{tv}(P^t(x, \cdot), P^t(y, \cdot)) > 1/4 + \delta$.
By Claim 1 there is an AM protocol P for SD$_{1/4+\delta,\, 1/4-\delta}$ whose completeness exceeds its soundness by a
constant multiple of $\delta$.
The prover and the verifier now engage in the AM protocol to distinguish whether the distance between the
two distributions is large or small. The completeness and soundness follow from those of the protocol P. ■

Lemma 5 Let $0 < \delta < 1/4$. For $1 < c < \frac{1}{2}\sqrt{t_{\max}/(t^2 n^3)}\,(3/4 - \delta)$, it is coNP-hard to decide GPTC$_{c,\delta}$.

Proof: The proof is by reduction from UnSAT, which is coNP-hard. Let $\psi$ be an instance of UnSAT, that
is, a CNF formula on n variables. The vertices of the Markov chain are the vertices of the hypercube H,
$V(H) = \{0,1\}^n$ and edges $E(H) = \{(y_1, y_2) : |y_1 - y_2| = 1\}$. We set edge weights for the Markov chain
as follows. Let d be a parameter to be chosen later which is at most a constant.

• For each edge in E(H) set the weight to be 1.

• If $\psi(y) = 0$ add a self loop of weight n at y.

• If $\psi(y) = 1$ add a self loop of weight $n^d$ at y.
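The weighted hypercube walk can be built explicitly for a tiny formula. In this sketch the formula is passed as a Python predicate `psi` (a hypothetical stand-in for a CNF evaluator), and the heavy self loop at satisfying assignments is assumed to have weight $n^d$. Transition probabilities are the edge weights divided by the weighted degree.

```python
from fractions import Fraction
from itertools import product

def build_chain(psi, n, d):
    """Transition rows of the random walk on the weighted hypercube.
    Each vertex has n unit-weight cube edges plus a self loop of weight
    n (if psi is false there) or n**d (if psi is true there)."""
    P = {}
    for y in product((0, 1), repeat=n):
        loop = n ** d if psi(y) else n
        degree = n + loop
        row = {y: Fraction(loop, degree)}
        for i in range(n):
            z = y[:i] + (1 - y[i],) + y[i + 1:]   # neighbour differing in bit i
            row[z] = Fraction(1, degree)
        P[y] = row
    return P

# Hypothetical formula x1 AND x2 AND x3 (three unit clauses), satisfied only by (1, 1, 1).
n, d = 3, 3
P = build_chain(lambda y: all(y), n, d)
leave_sat = 1 - P[(1, 1, 1)][(1, 1, 1)]
leave_unsat = 1 - P[(0, 0, 0)][(0, 0, 0)]
```

For these parameters `leave_sat` is 1/10, matching $1/(n^{d-1} + 1)$, while `leave_unsat` is 1/2. A satisfying assignment therefore acts as a trap that the walk leaves only rarely, and this is what slows mixing in the satisfiable case.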

In the YES case, if $\psi$ is unsatisfiable, the Markov chain is just the random walk on the hypercube with
probability 1/2 of self loop at each vertex and it is well known that

$$\tau(1/4 - \delta) \le C_\delta\, n \log n$$

where $C_\delta$ is a constant depending on $(1/4 - \delta)^{-1}$ polynomially.

In the NO case, where $\psi$ is satisfiable, we will lower bound the time to couple from a satisfying state $y$ and
the state $\bar{y}$, obtained by flipping all the bits of $y$. Consider the distributions $X(t), Y(t)$ of the chain which
are started at $y$ and at $\bar{y}$. We can bound the variation distance after t steps as follows

$$d(t) \ge 1 - P[\exists s \le t \text{ s.t. } X(s) \neq y] - P[\exists s \le t \text{ s.t. } Y(s) = y].$$

In each step, the chain started at $y$ has chance at most $1/(n^{d-1} + 1)$ of leaving. On the other hand, the
probability that the walk started from $\bar{y}$ hits $y$ in time t is exponentially small. Therefore

$$d(t) \ge 1 - 2t/(n^{d-1} + 1)$$

which implies that

$$\tau(1/4 + \delta) \ge \tfrac{1}{2}\, n^{d-1} (3/4 - \delta).$$
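
The NO-case argument can be checked numerically on a toy instance. The sketch below is plain Python with hypothetical helper names (`build_chain`, `step`, `tv`, `pair_distances`). It builds the weighted hypercube walk for a small n, taking as an assumed example a formula whose only satisfying assignment is the all-ones vector, and computes the exact variation distance between the chains started at a satisfying state and at its bitwise complement. Since d(t) is a maximum over pairs of starting states, each value is a lower bound on d(t). The state space has 2^n elements, so this is feasible only for very small n.

```python
from itertools import product


def build_chain(n, d, psi):
    """Transition matrix of the weighted hypercube walk (toy sizes only)."""
    states = list(product((0, 1), repeat=n))
    index = {s: i for i, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for s in states:
        loop = n ** d if psi(s) else n   # heavy loop at satisfying states
        total = n + loop                 # n unit-weight hypercube edges + loop
        i = index[s]
        P[i][i] = loop / total
        for k in range(n):
            neighbour = s[:k] + (1 - s[k],) + s[k + 1:]
            P[i][index[neighbour]] = 1.0 / total
    return states, index, P


def step(dist, P):
    """One step of the chain applied to a distribution (row vector)."""
    m = len(P)
    return [sum(dist[i] * P[i][j] for i in range(m)) for j in range(m)]


def tv(p, q):
    """Total variation distance between two distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def pair_distances(n, d, psi, y, t_max):
    """Distances between P^t(y, .) and P^t(y_bar, .) for t = 1..t_max."""
    states, index, P = build_chain(n, d, psi)
    y_bar = tuple(1 - b for b in y)
    p = [0.0] * len(states)
    q = [0.0] * len(states)
    p[index[y]] = 1.0
    q[index[y_bar]] = 1.0
    out = []
    for _ in range(t_max):
        p, q = step(p, P), step(q, P)
        out.append(tv(p, q))
    return out


# Assumed toy parameters: n = 4 variables, d = 3, psi satisfied only by 1111.
n, d = 4, 3
psi = lambda s: all(s)
for t, dist in enumerate(pair_distances(n, d, psi, (1,) * n, 3), start=1):
    print(t, dist, 1 - 2 * t / (n ** (d - 1) + 1))
```

For fewer than n steps the walk started at the complement cannot have reached the satisfying state at all, so in that range the comparison with the lower bound is unconditional.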

Choose d to be a large enough constant (which may depend polynomially on $\delta^{-1}$), such that

$$\tfrac{1}{2}\, n^{d-1} (3/4 - \delta) > c\, C_\delta\, n \log n. \qquad (12)$$

On the other hand we can show a polynomial upper bound on the mixing time by bounding the conductance
as follows. Let M' be the Markov chain which is the random walk on the hypercube with self loop probabilities
of 1/2 (where the edge weights are as in the case where $\psi$ is unsatisfiable). We bound the conductance
of M by showing it is not too much smaller than the conductance of M'. We use the fact that for any vertex
x, the weighted degree $d_x \le (n^{d-1} + 1)\, d'_x$. Let $A \subset V(H)$.



$$\Phi_A(M) = \frac{\sum_{x \in A,\, y \in A^c} w_{xy}}{\sum_{x \in A} d_x} = \frac{\sum_{x \in A,\, y \in A^c} w'_{xy}}{\sum_{x \in A} d_x} \ge \frac{\sum_{x \in A,\, y \in A^c} w'_{xy}}{(n^{d-1}+1) \sum_{x \in A} d'_x} = \frac{\Phi_A(M')}{n^{d-1}+1} \ge \frac{1}{n\,(n^{d-1}+1)}$$

where we are assuming the lower bound on the conductance of the hypercube is $1/n$.

We can lower bound $\pi_{\min}$ by $1/(n 2^{n+1} + n^d 2^n)$ and hence we have for large enough n,

$$\log(\pi_{\min}^{-1}) \le 2n$$

and hence by Theorem 5, $\tau(1/4) \le 32\, n^{2d+1}$.

The reduction can be completed by setting $x = 0^n$, the vector of all 0's, $t_{\max} = 32\, n^{2d+1}$ and $t = C_\delta\, n \log n$.

By (12), we see that $ct < \frac{1}{2}\sqrt{t_{\max}/n^{3}}\,(3/4 - \delta)$ as required. ■

5 Estimating Mixing Time for Arbitrary Markov Chains 

In this section we prove Theorem 3, which says that the problem of testing convergence is PSPACE-complete.
The idea of the hardness result is to simulate any PSPACE computation by a Markov chain so that if there is 
an accepting computation, the chain mixes quickly, while if the computation does not accept then the chain 
takes much longer to mix. 
We also recall some standard complexity theory background. 

Definition 4 A problem L is in PSPACE if there exists a Turing machine M which on input x of size n uses
a work tape with at most a polynomial $p(n)$ number of bits and outputs $M(x) = 1$ iff $x \in L$.

Definition 5 A problem $L_1$ is polynomial time reducible to another problem $L_2$ if there exists a polynomial
time computable function $f : \{0,1\}^* \to \{0,1\}^*$ (i.e. there is a polynomial time TM which computes the
output of f) such that $x \in L_1$ iff $f(x) \in L_2$.

Definition 6 A problem L is PSPACE-hard if any $A \in$ PSPACE is polynomial time reducible to it.

Definition 7 A problem L is in BP$_H$PSPACE if there is a probabilistic polynomial space Turing machine
M which on input x can flip any number of coins that is a bounded function in $|x|$ (but can only store
polynomially many of them), halts for every setting of the tosses, and satisfies

• If $x \in L$ then $\Pr(M(x) = 1) \ge 2/3$.

• If $x \notin L$ then $\Pr(M(x) = 1) \le 1/3$.

The following result can be deduced from Savitch's Theorem fTH . 

Theorem 6 BP$_H$PSPACE = PSPACE

The proof of Theorem 3 now follows by the following two lemmas. The following lemma uses the fact that
the t-step transition probabilities of a Markov chain can be approximated in BP$_H$PSPACE (see for example
11211 '). We include all the details here for completeness. 

Lemma 6 For every $1 < c < \exp(n^{O(1)})$ and $\exp(-n^{O(1)}) < \delta < 1/4$, the problem GTC$_{c,\delta}$ is in
BP$_H$PSPACE.

Proof: The proof is by showing that there is a randomized algorithm A for GTC$_{c,\delta}$ with 2-sided error using
at most a polynomial amount of space and exponentially many random bits. In particular, the algorithm A
on input $X = (C, x, t)$ queries C at most exponentially many times and

• If $\tau(1/4 - \delta) \le t$, $P(A(X, r) = 1) \ge 2/3$.

• If $\tau(1/4 + \delta) > ct$, $P(A(X, r) = 1) \le 1/3$.

We show below an algorithm to calculate a $\delta$ additive approximation $\hat{d}(t)$ to $d(t)$ with probability at least
2/3. The algorithm accepts if $\hat{d}(t) \le 1/4$ and rejects otherwise.

In the YES case, $d(t) \le 1/4 - \delta$ and therefore with probability at least 2/3, $\hat{d}(t) \le 1/4$ and the algorithm
will accept. In the NO case, $d(ct) > 1/4 + \delta$. Since the distance d is non-increasing (see e.g. Chapter 4 in
lUl), $d(t) > 1/4 + \delta$. Therefore, with probability at least 2/3, $\hat{d}(t) > 1/4$ and the algorithm will reject.

The algorithm to compute $\hat{d}(t)$ is as follows. Note first that it is possible to enumerate over all elements
of the state space of the chain $\Omega$, using at most a polynomial amount of space. It is enough to check for
each state whether it is reachable from x, which can be done in PSPACE once we can enumerate all the
adjacencies y for a vertex v. But this can be done in PSPACE by running over all possible random strings
and checking if for some r, $C(v, r) = y$.

For $x \in \Omega$ the algorithm runs the chain for t steps N times, starting at x each time, and sets $f_{x,z}$ to be
the fraction of times the chain stops at z. The estimates $f_{x,z}$ and $f_{y,z}$ can be computed with a polynomial
amount of space in this way. Let

$$M^t_{xy} = \frac{1}{2} \sum_{z \in \Omega} |f_{x,z} - f_{y,z}|.$$

$M^t_{xy}$ can be computed with a polynomial amount of space by running over all z. Let

$$\hat{d}(t) = \max_{x,y} M^t_{xy}.$$
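
The estimator just described can be sketched in Python as follows. This is an illustrative sketch, not the space-bounded procedure itself: the function names, the `step(state, rng)` interface standing in for the circuit C, and the explicit `states` list are assumptions, and for readability the empirical distributions are kept in memory, whereas the algorithm in the proof recomputes them in order to stay within polynomial space.

```python
import random
from collections import Counter


def empirical_distribution(step, x, t, runs, rng):
    """Fraction of runs, each t steps long and started at x, that end at z."""
    counts = Counter()
    for _ in range(runs):
        state = x
        for _ in range(t):
            state = step(state, rng)
        counts[state] += 1
    return {z: c / runs for z, c in counts.items()}


def estimate_distance(step, states, t, runs, seed=0):
    """Estimate d(t) as the max over pairs x, y of (1/2) sum_z |f_xz - f_yz|."""
    rng = random.Random(seed)
    f = {x: empirical_distribution(step, x, t, runs, rng) for x in states}
    best = 0.0
    for x in states:
        for y in states:
            m_xy = 0.5 * sum(
                abs(f[x].get(z, 0.0) - f[y].get(z, 0.0)) for z in states
            )
            best = max(best, m_xy)
    return best


def lazy_cycle_step(state, rng, size=5):
    """Toy chain (an assumption): stay w.p. 1/2, else move +1 or -1 on a cycle."""
    r = rng.random()
    if r < 0.5:
        return state
    return (state + (1 if r < 0.75 else -1)) % size


print(estimate_distance(lazy_cycle_step, list(range(5)), t=20, runs=2000))
```

Two degenerate chains give quick sanity checks: a chain that never moves yields an estimate of exactly 1, and a chain that always jumps to one fixed state yields exactly 0.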

There are two sources of error in the estimate for $P^t(x, z)$. The first is due to using only a polynomial
amount of space, whereas the t-step probabilities may be doubly exponentially small. The size of the error
is inversely exponential in the space we use. This (additive) error can be bounded by $\delta_a = \delta/4$ using
a polynomial amount of space since $\delta$ is always at least $\exp(-n^{O(1)})$. The second source of error $\delta_r$ is
random and can be bounded by $\delta/4$ by Chernoff bounds for an overall error of at most $\delta/2$. Thus, if the
number of runs N is at least $48 n \delta^{-2}$, by Chernoff bounds,

$$P(|P^t(x,z) - f_{x,z}| > \delta/2) \le 2^{-4n-2}.$$

Therefore, for every x, y, taking union over all z,

$$P(|M^t_{xy} - d_{tv}(P^t(x,\cdot), P^t(y,\cdot))| > \delta) \le 2 \cdot 2^{-3n-2} \le 2^{-2n-2}.$$

Therefore, we have

$$P(|\hat{d}(t) - d(t)| > \delta) \le P(\exists\, x, y \text{ s.t. } |M^t_{xy} - d_{tv}(P^t(x,\cdot), P^t(y,\cdot))| > \delta) \le \frac{1}{4}$$

where the last inequality follows by taking the union over all x, y.

Lemma 7 For every $1 < c < \exp(n^{O(1)})$ and $0 < \delta < 1/4$, it is PSPACE-hard to decide GTC$_{c,\delta}$.

Remark 1 In fact, the conclusion holds even if the Markov chain is restricted to be reversible.

Proof: We construct a polynomial time reduction from any $A \in$ PSPACE to GTC$_{c,\delta}$. Equivalently, we construct a
poly-time computable function f which maps strings $y \in A$ to YES instances of GTC$_{c,\delta}$
and strings $y \notin A$ to NO instances of GTC$_{c,\delta}$.

Since $A \in$ PSPACE, there is a polynomial $n(m) \ge m$ and a Turing machine $M_A$ which on input y of size m
uses at most $n = n(m)$ space and accepts if and only if $y \in A$.

For each $y \in \{0,1\}^m$, we define $f(y) = (C, x, t)$ as follows.

The state space $\Omega$ of the Markov chain P is the set of all possible configurations of the machine $M_A$ with
input $y \in \{0,1\}^m$. Since the machine uses space n, the state space of the Markov chain is a subset of
$\{0,1\}^n$.

Without loss of generality let $s_{start}$ be the starting state corresponding to the input y, and without loss of
generality, let $s_{acc}$ and $s_{rej}$ be the unique accept and reject states of the machine. In the case where either
$s_{acc}$ or $s_{rej}$ are never reached, they will be added to the state space. The Markov chain P is a reversible
random walk defined by setting edge weights as follows. There will be two types of weights: 1 and w where

$$w = \frac{1000\, D^3\, c\, 2^{3n}}{1 - 4\delta}.$$

• There will be a single edge of weight 1 connecting $s_{rej}$ and $s_{acc}$.

• For each pair of states that are connected by a single step of the machine they will be connected by a
single edge of weight w.

• There will be an edge of weight w connecting $s_{rej}$ to $s_{start}$.

• There will be a loop of weight w connecting each state to itself.
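
To make the edge-weight rules concrete, here is a small Python sketch that builds the weighted graph from a list of machine transitions and normalizes it into transition probabilities. The function names, the string state labels and the two-step toy machine are hypothetical, chosen only for illustration; a real instance has exponentially many configurations and is described implicitly by a circuit rather than by an explicit table.

```python
def build_weights(states, machine_steps, s_start, s_acc, s_rej, w):
    """Symmetric edge weights following the four rules above."""
    weights = {u: {} for u in states}

    def set_edge(u, v, weight):
        weights[u][v] = weight
        weights[v][u] = weight

    set_edge(s_rej, s_acc, 1)        # the single edge of weight 1
    for u, v in machine_steps:       # one step of the machine: weight w
        set_edge(u, v, w)
    set_edge(s_rej, s_start, w)      # edge from the reject state to the start
    for u in states:                 # loop of weight w at every state
        set_edge(u, u, w)
    return weights


def transition_probabilities(weights):
    """P(u, v) = weight(u, v) / (total weight of edges at u)."""
    probs = {}
    for u, row in weights.items():
        total = sum(row.values())
        probs[u] = {v: wt / total for v, wt in row.items()}
    return probs


# Toy NO instance (an assumption): start -> mid -> rej, acc is never reached.
toy_states = ["start", "mid", "rej", "acc"]
toy_steps = [("start", "mid"), ("mid", "rej")]
w = 100
P = transition_probabilities(
    build_weights(toy_states, toy_steps, "start", "acc", "rej", w)
)
print(P["rej"]["acc"], P["acc"]["rej"])
```

In this toy chain the weight-1 edge is taken with probability 1/(3w + 1) from the reject state and 1/(w + 1) from the accept state, both at most 1/w.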

A key role in the proof will be played by the graph G of all states of the machine connected by edges of
weight w corresponding to a single step of the machine. For any state of the machine, the number of vertices
connected to it by a transition of $M_A$ is at most a constant denoted $D - 2$ (depending on the finite number
of states of the machine and a constant number of bits on the tape). This implies in particular that the graph
G is of bounded degree D.

The circuit C will specify this Markov chain. For a polynomial time reduction, we require that the description
of the circuit is at most polynomial in m. Since $c < \exp(n^{O(1)})$ and $m \le n$, all the probabilities of
the Markov chain can be specified by polynomially many bits in n and hence polynomially many bits in m.
Secondly, because the TM reads and writes to only a small number of bits, we only have to check for a few
vertices v whether there is an edge from u to v, and this can be done with a polynomial sized circuit.

Next, we show bounds on the mixing time with the edge weights as defined above. For this we observe the
following.

Claim 3 In the YES case the graph G is connected and $\tau(1/4 - \delta) \le 10 D^3 2^{3n}/(1 - 4\delta)$.

Proof: Note that by the assumption on the Turing machine, all states are connected by w-edges to either
$s_{acc}$ or $s_{rej}$. Since $s_{rej}$ is connected by a w-edge to $s_{start}$, and $s_{start}$ is connected to $s_{acc}$ since we are in the
YES case, it follows that the graph G is connected.

We now use the conductance bound on the mixing time from Theorem 5 in the following way. For every
set but the empty set or the complete graph, there is weight at least w from the set to its complement.
Furthermore, the total weight of each set is at most $D 2^n w$. Therefore the conductance $\Phi \ge D^{-1} 2^{-n}$ and
hence we conclude that the mixing time $\tau(\epsilon)$ is at most

$$2 D^2 2^{2n} \log(2/(\pi_{\min}\,\epsilon)).$$

Here $\pi_{\min}$ is the minimum probability of any state in the space, which we can lower bound by $w/(D 2^n w) =
D^{-1} 2^{-n}$, and $\epsilon = 1/4 - \delta$. The proof follows since

$$\log 2 \le 1, \quad \log(1/\pi_{\min}) \le \log(D 2^n) \le D 2^n, \quad \log(1/\epsilon) = \log(1/(1/4 - \delta)) \le 4/(1 - 4\delta).$$



Claim 4 In the NO case the graph G is not connected. Moreover $d(t) \ge 1 - 2t/w$ for all t and

$$\tau(1/4 + \delta) \ge \tau(1/2) \ge w/4.$$

Proof: We first note that the bound $\tau(1/4 + \delta) \ge \tau(1/2) \ge w/4$ immediately follows from $d(t) \ge 1 - 2t/w$.
In order to show that the graph is not connected we note that $s_{start}$ and $s_{acc}$ are not in the same component.
This follows from the fact that all edges of G are legal transitions of the machine. The only other edges of
the Markov chain are loops or the edge connecting $s_{rej}$ to $s_{start}$. Consider in the graph G the components of
$s_{start}$ and of $s_{acc}$, denoted by $C_{start}$ and $C_{acc}$ respectively.

In order to bound $d(t)$ we look at the distributions $X(t), Y(t)$ of the chain started at $s_{start}$ and at $s_{acc}$. Note
that

$$d(t) \ge 1 - P[\exists s \le t \text{ s.t. } X(s) \in C_{acc}] - P[\exists s \le t \text{ s.t. } Y(s) \in C_{start}].$$

We note that the only way to move between the components is by following the edge of weight 1, and the
probability of taking this edge at any step (conditioned on the past) is at most $w^{-1}$. It therefore follows that

$$d(t) \ge 1 - 2t/w,$$

as needed.
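
The bound of d(t) by 1 - 2t/w can be observed exactly on a toy NO instance. The Python sketch below uses a hypothetical three-state chain (the labels `start`, `rej`, `acc` and the helper names are assumptions made for illustration) in which the only link between the component of `start` and the state `acc` is the weight-1 edge. It computes the variation distance between the chains started at `start` and at `acc`, which is a lower bound on d(t).

```python
def toy_chain(w):
    """Transition probabilities of a 3-state NO instance with heavy weight w."""
    weights = {
        "start": {"start": w, "rej": w},
        "rej": {"start": w, "rej": w, "acc": 1},
        "acc": {"rej": 1, "acc": w},
    }
    return {
        u: {v: wt / sum(row.values()) for v, wt in row.items()}
        for u, row in weights.items()
    }


def evolve(dist, probs):
    """One step of the chain applied to a distribution given as a dict."""
    out = {u: 0.0 for u in probs}
    for u, mass in dist.items():
        for v, p in probs[u].items():
            out[v] += mass * p
    return out


def pair_distance(probs, x, y, t):
    """Total variation distance between P^t(x, .) and P^t(y, .)."""
    p = {u: float(u == x) for u in probs}
    q = {u: float(u == y) for u in probs}
    for _ in range(t):
        p, q = evolve(p, probs), evolve(q, probs)
    return 0.5 * sum(abs(p[u] - q[u]) for u in probs)


w = 50
chain = toy_chain(w)
for t in (1, 5, 10, 20):
    # Exact pair distance next to the lower bound 1 - 2t/w.
    print(t, pair_distance(chain, "start", "acc", t), 1 - 2 * t / w)
```

The distance stays above the linear bound for every t, and the bound becomes vacuous once t exceeds w/2, which is why the reduction needs w to be much larger than the YES-case mixing time.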

By Claims 3 and 4,

$$\frac{w/4}{10 D^3 2^{3n}/(1 - 4\delta)} \ge \frac{\dfrac{1000\, D^3\, c\, 2^{3n}}{4(1 - 4\delta)}}{\dfrac{10 D^3 2^{3n}}{1 - 4\delta}} \ge c.$$

To complete the reduction, let $t = 10 D^3 2^{3n}/(1 - 4\delta)$ and set the starting state $x = s_{start}$.

■